Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data
Abstract
White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. However, sequencing and assembling the large and highly repetitive spruce genome pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 Gbp draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies.
Availability: The Picea glauca genome sequencing and assembly data are available through NCBI (Accession: ALWZ0100000000; PID: PRJNA83435). http://www.ncbi.nlm.nih.gov/bioproject/83435
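The scaffold N50 quoted in the abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are assumptions made for illustration and are not taken from the spruce assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (hypothetical data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

For a real assembly, the lengths would be read from the scaffold FASTA file; with 4.9 million scaffolds the sort remains cheap compared with the assembly itself.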
Similar resources
Organellar Genomes of White Spruce (Picea glauca): Assembly and Annotation.
The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data as each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid ...
Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.
White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of w...
ntCard: a streaming algorithm for cardinality estimation in genomics data
Motivation: Many bioinformatics algorithms are designed for the analysis of sequences of some uniform length, conventionally referred to as k-mers. These include de Bruijn graph assembly methods and sequence alignment tools. An efficient algorithm to enumerate the number of unique k-mers, or even better, to build a histogram of k-mer frequencies would be desirable for these tools and their do...
Assembly of an early-matured japonica (Geng) rice genome, Suijing18, based on PacBio and Illumina sequencing
The early-matured japonica (Geng) rice variety, Suijing18 (SJ18), carries multiple elite traits including durable blast resistance, good grain quality, and high yield. Using PacBio SMRT technology, we produced over 25 Gb of long-read sequencing raw data from SJ18 with a coverage of 62×. Using Illumina paired-end whole-genome shotgun sequencing technology, we generated 59 Gb of short-read sequen...
Gene mapping in white spruce (P. glauca): QTL and association studies integrating population and expression data
Background: Connecting phenotype with genotype is the basis for developing forest genetic applications such as marker-assisted selection (MAS). Quantitative trait locus (QTL) mapping and genetic association mapping (or linkage disequilibrium (LD) mapping) are two major approaches to find genes that control phenotypes of interest in forest trees. Quantitative trait loci (QTL) and association mapping exper...